A multi-scalar story of the diffusion of a new technology: the web


Emmanouil Tranos

University of Bristol, Alan Turing Institute, @EmmanouilTranos, etranos.info

Contents

  • Introduction

  • Web data

  • Methods

  • Results

    • S-shaped diffusion curves
    • Rank dynamics
    • Spatial analysis
    • Modelling
  • Conclusions

Introduction

Aim

  • Diffusion of a new technology: the web
  • Geographers used to be interested in diffusion
  • Hägerstrand et al. (1968)
  • Passed the torch to economists and sociologists
  • Why? Lack of granular data:

Because new digital activities are rarely—if ever—captured in official state data, researchers must rely on information gathered from alternative sources (Zook and McCanless 2022).

Importance

  • Guide policies for deployment of new technologies

  • Predictions of introduction times for future technologies (Meade and Islam 2021):

    • Network operators

    • Suppliers of network equipment

    • Regulatory authorities

Technological diffusion


Spatial diffusion processes

  • As in temporal diffusion models, an S-shaped pattern in the cumulative level of adoption

  • A hierarchy effect: from main centres to secondary ones – central places

  • A neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)

Hägerstrand (1965): from innovative centres (core) through a hierarchy of sub-centres, to the periphery

Diffusion of a new digital technology


  • Diffusion of an intangible/digital technology

  • Map the active engagement with the digital

  • Over time, early stages of the internet

  • Granular and multi-scale spatial perspective

Web data

Long story short

  • Data from the Internet Archive, the oldest web archive

  • Observe commercial websites 1996-2012 in the UK (.co.uk)

  • Geolocation: postcode references in the text

  • Timestamp: archival year

  • Counts

Web data: The Internet Archive

  • The largest archive of webpages in the world
  • 273 billion webpages from over 361 million websites, 15 petabytes of storage (1996 onwards)
  • A web crawler starts with a list of URLs (a seed list) to crawl and downloads a copy of their content
  • Using the hyperlinks included in the crawled URLs, new URLs are identified and crawled (snowball sampling)
  • Time-stamp
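The crawling loop described above can be sketched as a breadth-first traversal. This is a minimal illustration, not the Internet Archive's crawler: the toy link graph `TOY_WEB`, the function `crawl` and the `max_pages` cap are all hypothetical.

```python
from collections import deque

# Hypothetical toy web: each URL maps to the URLs it links to.
TOY_WEB = {
    "a.co.uk": ["b.co.uk", "c.co.uk"],
    "b.co.uk": ["c.co.uk", "d.co.uk"],
    "c.co.uk": ["a.co.uk"],
    "d.co.uk": [],
    "e.co.uk": ["a.co.uk"],  # no page links to it, so it is never discovered
}


def crawl(seed_list, fetch_links, max_pages=100):
    """Breadth-first crawl: visit the seeds, then every newly found link."""
    queue = deque(seed_list)
    seen = set(seed_list)
    archived = []
    while queue and len(archived) < max_pages:
        url = queue.popleft()
        archived.append(url)  # stand-in for "download and time-stamp a copy"
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return archived


pages = crawl(["a.co.uk"], lambda url: TOY_WEB.get(url, []))
print(pages)
```

The page `e.co.uk` is never archived because nothing reachable from the seed links to it, which is the coverage limitation of snowball sampling.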

Our web data

  • JISC UK Web Domain Dataset: all archived webpages from the .uk domain 1996-2012

  • Curated by the British Library

  • Tranos, E., and C. Stich. 2020. Individual internet usage and the availability of online content of local interest: A multilevel approach. Computers, Environment and Urban Systems, 79:101371.

  • Tranos, E., T. Kitsos, and R. Ortega-Argilés. 2021. Digital economy in the UK: Regional productivity effects of early adoption. Regional Studies, 55:12, 1924-1938.

  • Stich, C., E. Tranos and M. Nathan. 2022. Modelling clusters from the ground up: a web data approach. Environment and Planning B, in press.

  • Tranos, E., A. C. Incera and G. Willis. 2022. Using the web to predict regional trade flows: data extraction, modelling, and validation, Annals of the AAG, in press.

Our web data

  • All .uk archived webpages which contain a UK postcode in the web text

  • Circa 0.5 billion URLs with valid UK postcodes



20080509162138/http://www.website1.co.uk/contact_us IG8 8HD
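A record like the one above can be split into archival year, host and postcode. The sketch below assumes the layout seen in the example (a 14-digit timestamp, a slash, the URL, then a two-part postcode); `parse_record` is a hypothetical helper, not part of the dataset's tooling.

```python
from datetime import datetime
from urllib.parse import urlparse


def parse_record(line):
    """Split 'TIMESTAMP/URL POSTCODE' into its parts (layout assumed from the example)."""
    # The postcode is taken to be the last two space-separated tokens.
    stamp_and_url, outward, inward = line.rsplit(" ", 2)
    stamp, url = stamp_and_url.split("/", 1)
    archived_at = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return {
        "year": archived_at.year,
        "host": urlparse(url).netloc,
        "url": url,
        "postcode": f"{outward} {inward}",
    }


record = parse_record("20080509162138/http://www.website1.co.uk/contact_us IG8 8HD")
print(record)
```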

Data cleaning

Unique postcodes frequencies, 2000

Level             Freq.     Share    Cum. freq.    Cum. share
(0,1]             41,596    0.718    41,596        0.718
(1,2]              6,451    0.111    48,047        0.830
(2,10]             6,163    0.106    54,210        0.936
(10,100]           2,975    0.051    57,185        0.988
(100,1000]           646    0.011    57,831        0.999
(1000,10000]          62    0.001    57,893        1.000
(10000,100000]         4    0.000    57,897        1.000


  • Websites with a large number of postcodes: e.g. directories, real estate websites

  • Focus on websites with one unique postcode per year
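The filtering rule can be sketched as follows, assuming observations have already been reduced to (host, year, postcode) triples. The sample rows and the helper `single_postcode_sites` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (host, year, postcode) observations.
observations = [
    ("shop.co.uk", 2000, "BS8 1TH"),
    ("shop.co.uk", 2000, "BS8 1TH"),  # same postcode found on another page
    ("directory.co.uk", 2000, "IG8 8HD"),
    ("directory.co.uk", 2000, "M1 1AE"),
    ("directory.co.uk", 2000, "EH1 1YZ"),
    ("shop.co.uk", 2001, "BS8 1TH"),
]


def single_postcode_sites(rows):
    """Keep (host, year) pairs that reference exactly one unique postcode."""
    postcodes = defaultdict(set)
    for host, year, postcode in rows:
        postcodes[(host, year)].add(postcode)
    return {key: next(iter(pcs)) for key, pcs in postcodes.items() if len(pcs) == 1}


kept = single_postcode_sites(observations)
print(kept)
```

The directory-style site drops out because it carries three distinct postcodes in the same year.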

Directory website with a lot of postcodes

Website with a unique postcode in London

Methods

Reminder: diffusion mechanisms

  • S-shaped pattern in the cumulative level of adoption

  • A hierarchy effect: from main centres to secondary ones

  • A neighbourhood effect: first “hitting” nearby locations

Methods

  • Model cumulative adoption

  • Descriptive statistics, ESDA & density regressions

  • Modelling framework

  • Two scales:

    • websites per firm in a Local Authority
    • websites in an Output Area
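One common way to model cumulative adoption is a logistic curve, y(t) = K / (1 + exp(-r (t - t0))). The slides do not state the exact functional form, so the sketch below is an assumption: K (saturation level), r (diffusion speed) and t0 (midpoint year) are illustrative symbols, K is treated as known, and the data are synthetic.

```python
import math


def logistic(t, K, r, t0):
    """S-shaped cumulative adoption: saturation K, speed r, midpoint t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def fit_logistic(years, cumulative, K):
    """Estimate r and t0 by OLS on the logit transform, with K assumed known."""
    # ln(y / (K - y)) = r * t - r * t0 is linear in t.
    z = [math.log(y / (K - y)) for y in cumulative]
    n = len(years)
    mean_t = sum(years) / n
    mean_z = sum(z) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxz = sum((t - mean_t) * (v - mean_z) for t, v in zip(years, z))
    r = sxz / sxx
    intercept = mean_z - r * mean_t
    return r, -intercept / r


years = list(range(1996, 2013))
synthetic = [logistic(t, K=1.0, r=0.6, t0=2003) for t in years]  # synthetic data
r_hat, t0_hat = fit_logistic(years, synthetic, K=1.0)
print(round(r_hat, 3), round(t0_hat, 1))
```

Fitting one curve per area and comparing the estimated r values is one way to read "diffusion speed" across space.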

S-shaped diffusion curves

Diffusion speed


  • Spatial heterogeneity

  • No clear, easy-to-explain pattern

Rank dynamics: stability vs. volatility


  • Adoption heterogeneity

  • Different perceptions of risk and economic returns from new technologies

  • Early adopters vs. laggards, leapfrogging

Rank dynamics and diffusion speed

  • Spatial heterogeneity

  • Expected volatility
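Rank dynamics can be illustrated by ranking areas on adoption in two years and comparing positions. The four areas and their values below are hypothetical, and `ranks` is an illustrative helper.

```python
def ranks(values):
    """Rank areas from 1 (highest value) downwards; ties are broken by name."""
    ordered = sorted(values, key=lambda area: (-values[area], area))
    return {area: position for position, area in enumerate(ordered, start=1)}


# Hypothetical websites-per-firm values for four areas in two years.
density_1998 = {"A": 0.10, "B": 0.06, "C": 0.02, "D": 0.01}
density_2008 = {"A": 0.40, "B": 0.25, "C": 0.45, "D": 0.05}

early, late = ranks(density_1998), ranks(density_2008)

# Positive change = the area climbed the ranking (leapfrogging).
rank_change = {area: early[area] - late[area] for area in early}
print(rank_change)
```

Here area C leapfrogs from third to first, A and B each slip one place, and D is stable.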

Spatial mechanisms

Neighbourhood effect: diffusion proceeds outwards from innovation centres, first “hitting” nearby rather than far-away locations (Grubler 1990)

  • Spatial dependency (Moran’s I & LISA maps)

  • Website density regressions – distance effect

  • Websites per firm in Local authorities (c. 400)

  • Websites in Output Areas (c. 200,000)
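Global Moran's I, the spatial dependency statistic named above, can be written out directly. The sketch assumes a binary contiguity matrix for four hypothetical areas on a line; the slides do not specify the weights used in the analysis.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / s0) * cross / sum(d * d for d in dev)


# Four hypothetical areas on a line; neighbours share a border.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

i_clustered = morans_i([10, 9, 2, 1], W)      # similar values sit side by side
i_alternating = morans_i([10, 1, 10, 1], W)   # neighbours are dissimilar
print(round(i_clustered, 3), round(i_alternating, 3))
```

Positive values indicate that high-adoption areas neighbour other high-adoption areas; negative values indicate a checkerboard pattern.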

Website density regressions


\[Website\,Density_{i} = a + \beta Distance\,to\,Place_{i} + e_{i}\]


\(Website\,Density_{i}\):

  • Websites per firm in a Local Authority \(i\), or

  • Websites in an Output Area \(i\)

\(Place\):

  • London, or

  • Nearest city, or

  • Nearest retail centre

Website density regressions


\(\beta\) interpretation:

  • The lower \(\beta\) is (that is, the larger \(|\beta|\) is for a negative \(\beta\))…

  • … the stronger urban gravitation is for web adoption.
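The bivariate regression and the reading of \(\beta\) can be illustrated with a closed-form OLS fit. The distances and densities below are hypothetical, chosen only to contrast a steep and a flat distance gradient.

```python
def ols(x, y):
    """Closed-form OLS for y = a + beta * x; returns (a, beta)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


# Hypothetical areas at increasing distance (km) from a place such as London.
distance_km = [0, 50, 100, 150, 200]
density_steep = [0.50, 0.30, 0.20, 0.10, 0.05]  # density falls quickly with distance
density_flat = [0.50, 0.45, 0.42, 0.40, 0.38]   # density barely depends on distance

_, beta_steep = ols(distance_km, density_steep)
_, beta_flat = ols(distance_km, density_flat)
print(round(beta_steep, 5), round(beta_flat, 5))
```

Both slopes are negative, and the steeper one (larger \(|\beta|\)) corresponds to stronger urban gravitation.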

Neighbourhood effect

  • Spatial dependency
    • Relatively small, consistent over time / scales
    • London hot spot early on
    • At local scale, consistent hotspots over time
    • Granular analysis reveals other hotspots
  • Distance effect
    • Urban gravitation increases over time and then drops
    • Granular analysis: gravitation
    • Lost explanatory power over time


Hierarchy effect: from main centres to secondary ones – central places

  • Gini coefficient

Hierarchy

  • Almost perfect polarisation of web adoption in the early stages at a granular level

  • Polarisation decreases over time

  • More equally diffused at the Local Authority level
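The Gini coefficient used as the polarisation measure can be computed from sorted values. This is the standard rank-based formula; the two example distributions are hypothetical.

```python
def gini(values):
    """Gini coefficient: 0 = perfectly equal, (n - 1) / n = all held by one unit."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical website counts across four areas.
g_equal = gini([5, 5, 5, 5])         # adoption spread evenly
g_polarised = gini([0, 0, 0, 20])    # all websites in a single area
print(g_equal, g_polarised)
```

A coefficient falling over time from values near the upper bound towards lower values would match the pattern of decreasing polarisation described above.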

Putting all of these together

Modelling framework

  • Random forests to predict \(Website\,Density_{i,t}\)

  • 4 sets of models (2 designs × 2 scales):

    • All the data ⟹ variable importance
    • Train on one region, test on the rest ⟹ spatial differences and similarities of diffusion mechanisms
    • Each design for Local Authorities and for Output Areas
  • Space-time sensitive 10-fold CV (CAST)
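The idea behind space-time sensitive cross-validation is that held-out rows share neither a location nor a year with the training rows. The sketch below illustrates that idea only; it is not the CAST API, and `spacetime_folds`, the toy panel and the choice of 3 folds (rather than 10) are assumptions.

```python
import random


def spacetime_folds(rows, k, seed=0):
    """rows: list of (area, year). Returns k pairs of (train_idx, test_idx)."""
    rng = random.Random(seed)
    areas = sorted({area for area, _ in rows})
    years = sorted({year for _, year in rows})
    rng.shuffle(areas)
    rng.shuffle(years)
    area_fold = {area: i % k for i, area in enumerate(areas)}
    year_fold = {year: i % k for i, year in enumerate(years)}
    folds = []
    for f in range(k):
        # Test rows: held-out areas observed in held-out years.
        test = [i for i, (a, y) in enumerate(rows)
                if area_fold[a] == f and year_fold[y] == f]
        # Training rows: neither the area nor the year is held out.
        train = [i for i, (a, y) in enumerate(rows)
                 if area_fold[a] != f and year_fold[y] != f]
        folds.append((train, test))
    return folds


# Hypothetical panel: 6 areas observed in 6 years.
panel = [(f"area{a}", 1996 + y) for a in range(6) for y in range(6)]
folds = spacetime_folds(panel, k=3)
print([(len(train), len(test)) for train, test in folds])
```

Rows that share only the area or only the year with the test set are dropped from that fold, which trades training data for a stricter test of transfer across space and time.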

Models trained on all data


                     RMSE     RSquared   MAE
Local Authorities    0.022    0.908      0.012
Output Areas         3.613    0.627      0.533
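For reference, the three reported metrics can be computed as below. R² is taken here as the squared Pearson correlation between observed and predicted values, one common convention; the slides do not say which definition was used, and the numbers are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


# Hypothetical observed and predicted website densities.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]  # every prediction is 0.5 too high

print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

RMSE and MAE are in the units of the outcome, so the Local Authority figures (websites per firm) and the Output Area figures (website counts) are not directly comparable; R² is unit-free.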

Variable importance for Local Authorities

Variable importance for Output Areas

Regional similarities for Local Authorities

Regional similarities for Output Areas

Conclusions

  • Established technological diffusion drivers still apply

    • for a digital technology
    • at local scales
  • Geography matters: spatial dependency, urban gravitation

  • Some indications of hierarchical diffusion

  • Granular analysis reveals patterns otherwise not visible

  • Stability and volatility: leapfrogging, early adopters dropping, but also stable positions

  • Spatially consistent mechanisms at local scale

  • Heterogeneity increases with resolution

References

Grubler, Arnulf. 1990. The Rise and Fall of Infrastructures: Dynamics of Evolution and Technological Change in Transport. Physica-Verlag.
Hägerstrand, Torsten. 1965. “A Monte Carlo Approach to Diffusion.” European Journal of Sociology/Archives Européennes de Sociologie 6 (1): 43–67.
Hägerstrand, Torsten et al. 1968. Innovation Diffusion as a Spatial Process.
Meade, Nigel, and Towhidul Islam. 2021. “Modelling and Forecasting National Introduction Times for Successive Generations of Mobile Telephony.” Telecommunications Policy 45 (3): 102088.
Zook, Matthew, and Michael McCanless. 2022. “Mapping the Uneven Geographies of Digital Phenomena: The Case of Blockchain.” The Canadian Geographer/Le Géographe Canadien 66 (1): 23–36.